A comprehensive evaluation of alignment software for reduced representation bisulfite sequencing data.
Abstract
Motivation: The rapid development of next-generation sequencing technology makes it possible to study genome-wide DNA methylation at single-base resolution. However, the depletion of unmethylated cytosines in bisulfite-converted reads (they are read as thymines) makes aligning those reads to a large reference genome challenging. Software tools for aligning such reads have not yet been comprehensively evaluated, especially for the widely used reduced representation bisulfite sequencing (RRBS) protocol, which enriches for CpG islands (CGIs).

Results: We developed a simulator, RRBSsim, for benchmarking the analysis of RRBS data, and performed an extensive comparison of seven mapping algorithms for methylation analysis on both real and simulated RRBS data. Eighteen lung tumors and their matched adjacent tissues were sequenced using the RRBS protocol. Our empirical evaluation found that methylation results were less consistent across software tools for CpG sites with low sequencing depth or a medium methylation level, and for CpG sites located on CGI shores or in gene bodies. Simulations confirmed these observations: the tools generally had lower recall in detecting these vulnerable CpG sites and lower precision in estimating their methylation levels.

Conclusion: Among the tools tested, bwa-meth and BS-Seeker2 (bowtie2) are currently our preferred aligners for RRBS data in terms of recall, precision, and speed. Existing aligners do not efficiently handle moderately methylated CpG sites, or CpG sites on CGI shores or in gene bodies, so methylation results from these vulnerable sites should be interpreted with caution. Our study reveals several important features inherent in methylation data, and RRBSsim provides guidance for advancing sequence-based methylation data analysis and methodological development.
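The benchmarking idea described under Results (recall in detecting CpG sites and accuracy of estimated methylation levels, both measured against a known simulated truth) can be illustrated with a toy Python sketch. It is not RRBSsim or any aligner's code, and every function name is hypothetical. It assumes forward-strand reads that are already aligned at known offsets, no sequencing errors, and the usual per-CpG methylation level (fraction of reads showing C, i.e. protected from conversion, versus T, i.e. converted). "Precision" here is an assumed definition (share of detected sites whose estimate lies within a tolerance of the truth) and may differ from the paper's metric.

```python
# Toy sketch, NOT RRBSsim or any aligner's implementation. All names are
# hypothetical. Assumptions: forward strand only, reads already aligned at a
# known start offset, no sequencing errors, no non-CpG methylation.


def bisulfite_convert(seq: str, methylated: set[int], offset: int = 0) -> str:
    """Return seq with every unmethylated C read as T.

    `methylated` holds reference positions of methylated cytosines and
    `offset` is the reference position of seq[0].
    """
    return "".join(
        "T" if base == "C" and (offset + i) not in methylated else base
        for i, base in enumerate(seq)
    )


def call_cpg_levels(reference: str, reads: list[tuple[int, str]]) -> dict[int, float]:
    """Estimate the methylation level of each covered forward-strand CpG.

    Level = (#reads with C) / (#reads with C or T) at the CpG cytosine.
    """
    counts: dict[int, list[int]] = {}  # pos -> [C count, T count]
    for start, read in reads:
        for i, base in enumerate(read):
            pos = start + i
            if reference[pos:pos + 2] != "CG":
                continue  # only score cytosines in a CpG context
            c_t = counts.setdefault(pos, [0, 0])
            if base == "C":
                c_t[0] += 1  # protected from conversion: methylated
            elif base == "T":
                c_t[1] += 1  # converted: unmethylated
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}


def benchmark(truth: dict[int, float], estimates: dict[int, float],
              tol: float = 0.1) -> tuple[float, float]:
    """Recall: share of simulated CpG sites that received an estimate.

    Precision (assumed definition): share of detected sites whose estimated
    level lies within `tol` of the simulated level.
    """
    detected = [pos for pos in truth if pos in estimates]
    recall = len(detected) / len(truth) if truth else 0.0
    close = [pos for pos in detected if abs(estimates[pos] - truth[pos]) <= tol]
    precision = len(close) / len(detected) if detected else 0.0
    return recall, precision


if __name__ == "__main__":
    reference = "ACGTTCGAACGT"          # CpG cytosines at positions 1, 5, 9
    truth = {1: 1.0, 5: 0.5, 9: 0.0}    # simulated methylation levels
    # Two reads covering positions 0-7; the CpG at position 9 is never covered.
    reads = [
        (0, bisulfite_convert(reference[0:8], {1, 5})),
        (0, bisulfite_convert(reference[0:8], {1})),
    ]
    levels = call_cpg_levels(reference, reads)
    print(levels)                    # {1: 1.0, 5: 0.5}
    print(benchmark(truth, levels))  # (0.6666666666666666, 1.0)
```

In this miniature example the uncovered CpG at position 9 lowers recall to 2/3, loosely mirroring how low-depth sites reduce recall. With real data, alignment errors caused by the reduced sequence complexity after conversion would also shift the C/T counts and therefore the estimated levels.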
Availability: RRBSsim is a simulator for benchmarking the analysis of RRBS data; its source code is available at https://github.com/xwBio/RRBSsim or https://github.com/xwBio/Docker-RRBSsim.
Contact: [email protected] and [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar resources
SAAP-RRBS: streamlined analysis and annotation pipeline for reduced representation bisulfite sequencing
Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To addr...
RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing
SUMMARY Reduced representation bisulfite sequencing (RRBS) is a powerful yet cost-efficient method for studying DNA methylation on a genomic scale. RRBS involves restriction-enzyme digestion, bisulfite conversion and size selection, resulting in DNA sequencing data that require special bioinformatic handling. Here, we describe RRBSMAP, a short-read alignment tool that is designed for handling R...
RnBeads – Comprehensive Analysis of DNA Methylation Data
RnBeads is an R package for the comprehensive analysis of genome-wide DNA methylation data with single basepair resolution. Supported assays include the Infinium 450k microarray, whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), other forms of enrichment bisulfite sequencing, and any other large-scale method that can provide DNA methylation data at si...
Detection of significantly differentially methylated regions in targeted bisulfite sequencing data
MOTIVATION Bisulfite sequencing is currently the gold standard to obtain genome-wide DNA methylation profiles in eukaryotes. In contrast to the rapid development of appropriate pre-processing and alignment software, methods for analyzing the resulting methylation profiles are relatively limited so far. For instance, an appropriate pipeline to detect DNA methylation differences between cancer an...
GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data
High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing program...
Journal: Bioinformatics
Publication date: 2018